System and method for identifying computer users having files with common attributes

ABSTRACT

A system and a method for identifying computer users having files with common attributes are provided. The method includes generating a first table having a set of attributes for each file in a first set of files associated with a first computer user. The set of attributes for each file in the first set of files has a plurality of attribute types. The method further includes generating a second table having a set of attributes for each file in a second set of files associated with a second computer user. The set of attributes for each file in the second set of files has the plurality of attribute types. The method further includes generating a similarity table by comparing each set of attributes in the first table with each set of attributes in the second table, utilizing a predetermined similarity metric, and determining whether the first and second computer users have at least one file with common attributes, based on data in the similarity table.

FIELD OF INVENTION

This application relates to a system and a method for identifying computer users having files with common attributes.

BACKGROUND OF INVENTION

A growing problem in the realm of information technology is managing, organizing, finding, and making use of electronic data available within a business organization. Though data may exist within the business organization, it is often difficult to locate when needed. Consequently, much effort is employed to manage and organize information so that it may be easily found and used. Although search technologies have made it easier to find electronically available information if it has already been structured and organized for public browsing on the Internet, finding information within a private computer network (intranet) remains difficult. For example, searching an intranet gives limited results in part because content creators are insufficiently or improperly motivated to make their content “interesting” (i.e., rich with links to related documents), or attractive. Consequently, few viewers are in turn motivated to link to such content.

Moreover, intranet web pages are ideally designed to provide information organized in an efficient, hierarchical structure, and do not necessarily aim to connect information to other information. Consequently, for many intranet searches only one content page contains the sought-for data, and few (sometimes zero) links point to that intranet web page from other pages. Making an intranet search even more difficult, intranet files often lack identifying characteristics to make the files stand out in a particular search.

Furthermore, some data available in an intranet is not search-engine-friendly or was never intended to be viewed directly. For example, data may be stored in locations that can't easily be crawled by a “web spider”, or data may be intended only to form a portion of a larger set of data.

Members of a business organization may wish to identify others in the organization having common interests and ideas, as suggested by their maintenance of identical or similar files. However, current search schemes generally provide results only for purposely published files.

Accordingly, the inventors herein have recognized a need for an improved system and method for identifying computer users having files with common attributes.

SUMMARY OF INVENTION

A method for identifying computer users having files with common attributes in accordance with an exemplary embodiment is provided. The method includes generating a first table having a set of attributes for each file in a first set of files associated with a first computer user. The set of attributes for each file in the first set of files has a plurality of attribute types. The method further includes generating a second table having a set of attributes for each file in a second set of files associated with a second computer user. The set of attributes for each file in the second set of files has the plurality of attribute types. The method further includes generating a similarity table by comparing each set of attributes in the first table with each set of attributes in the second table, utilizing a predetermined similarity metric. The method further includes determining whether the first and second computer users have at least one file with common attributes, based on data in the similarity table.

A system for identifying computer users having files with common attributes in accordance with another exemplary embodiment is provided. The system includes first and second computers operably communicating with one another. The system further includes a display device operably communicating with the first computer. The first computer is configured to generate a first table having a set of attributes for each file in a first set of files associated with a first computer user. The set of attributes for each file in the first set of files has a plurality of attribute types. The second computer is further configured to generate a second table having a set of attributes for each file in a second set of files associated with a second computer user. The set of attributes for each file in the second set of files has the plurality of attribute types. The first computer is further configured to generate a similarity table by comparing each set of attributes in the first table with each set of attributes in the second table, utilizing a predetermined similarity metric. The first computer is further configured to determine whether the first and second computer users have at least one file with common attributes, based on data in the similarity table. The first computer is further configured to display a user identifier associated with at least one of the first and second computer users on the display device when the first and second computer users have at least one file with common attributes.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram of a computer network for identifying computer users having files with common attributes in accordance with an exemplary embodiment;

FIG. 2 is a schematic of a set of files associated with a computer user of the computer network of FIG. 1;

FIG. 3 is a schematic of a table of attributes associated with the set of files illustrated in FIG. 2;

FIG. 4 is a schematic of a set of files associated with a second computer user of the computer network of FIG. 1;

FIG. 5 is a schematic of a table of attributes associated with the set of files illustrated in FIG. 4;

FIG. 6 is a schematic of a similarity table derived from the tables of attributes in FIG. 3 and FIG. 5;

FIG. 7 is a flow chart of a method for identifying computer users having files with common attributes in accordance with another exemplary embodiment;

FIG. 8 is a schematic of an exemplary set of files associated with a computer user at a first time;

FIG. 9 is a schematic of a table of attributes associated with the set of files shown in FIG. 8;

FIG. 10 is a schematic of an exemplary set of files associated with the computer user of FIG. 8 at a time later than the first time;

FIG. 11 is a schematic of a table of attributes associated with the set of files shown in FIG. 10;

FIG. 12 is a schematic of a table having sets of attributes that are in the table of attributes of FIG. 11 and that are not in the table of attributes of FIG. 9;

FIG. 13 is a schematic of an exemplary set of files associated with a different computer user at a first time;

FIG. 14 is a schematic of a table of attributes associated with the set of files shown in FIG. 13;

FIG. 15 is a schematic of an exemplary set of files associated with the different computer user at a time later than the first time;

FIG. 16 is a schematic of a table of attributes associated with the set of files shown in FIG. 15;

FIG. 17 is a schematic of a table having sets of attributes that are in the table of attributes of FIG. 16 and that are not in the table of attributes of FIG. 14;

FIG. 18 is a schematic of a similarity table derived from the tables of attributes in FIG. 12 and FIG. 17;

FIG. 19 is a schematic of an exemplary search query entered by a search user in accordance with the similarity table of FIG. 6;

FIG. 20 is a schematic of an exemplary search result produced in accordance with the search query of FIG. 19; and

FIG. 21 is a flow chart of a method for identifying intranet users having files with common attributes in accordance with another exemplary embodiment.

DETAILED DESCRIPTION OF EMBODIMENTS

Referring to FIG. 1, a computer network 9 for allowing the identification of computer users having files with common attributes is illustrated. The computer network 9 includes a computer 10 associated with a first computer user and a computer 20 associated with a second computer user. In an exemplary embodiment, the computer network 9 comprises an Intranet which is a private network that uses Internet software and Internet standards. In the exemplary embodiment, the computers 10, 20 have at least central processing units 12, 22, data storage/RAM/ROM memory 14, 24, displays 16, 26, keyboards 18, 28, and network interfaces 19, 29, respectively. Computers 10 and 20 are in data communication with network medium 40. Network medium 40 may of course include one or more routers, switches, data hubs, or other data communication equipment, and may facilitate wired and/or wireless communication. Optionally, computer network 9 can further include additional computers, for example computer 30, which may be associated with additional computer users in the system. An additional computer such as computer 30 has at least a central processing unit 32, data storage/RAM/ROM memory 34, a display 36, a keyboard 38, and a network interface 39.

Referring to FIGS. 2-6, exemplary files and tables utilized by the computer network 9 for allowing the identification of computer users having files with common attributes are illustrated. In particular, a set of files 50 associated with the first computer user is illustrated. A set of files 60, associated with the second computer user, is also illustrated. In the exemplary embodiment, the set of files 50 resides in memory 14 of computer 10, and set of files 60 resides in memory 24 of computer 20. It is, of course, recognized that sets of files associated with different computer users may optionally be located in a shared memory of a single computer or shared network drive. The sets of files shown are meant only to be a representative example of files associated with computers; actual files will, of course, vary in name, location, size, etc.

According to an exemplary embodiment, a table 52 is generated, including a set of attributes for each file in the set of files 50. A table 62 is also generated, including a set of attributes for each file in the set of files 60. The word “table” herein, without limiting its scope, may include a database, index, list, or other equivalent collection or collections of data.

Tables 52 and 62 include, for each file associated respectively with sets of files 50, 60, a set of attributes including username, file or directory name, and checksum. It is of course recognized that the tables 52, 62 may include an alternative or different set of attributes for each file in the sets of files 50, 60. For example, and without meaning to limit the scope of attributes associated with each file, a set of attributes may include an associated user name, a file or directory name, a checksum value, a file location, a file size, a file creation date, a file modification date, a file access date, keywords from the file, and/or names of files located in the same directory.
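By way of a non-limiting illustration, the generation of a table of attributes such as table 52 or table 62 might be sketched as follows. The function names `file_checksum` and `build_attribute_table`, the dictionary keys, and the choice of SHA-256 as the checksum are assumptions made for this sketch only; the description does not name a particular checksum algorithm or data layout.

```python
import hashlib
import os
from typing import Dict, List


def file_checksum(path: str) -> str:
    """Return a hex digest of the file contents (SHA-256 is an assumed choice)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in blocks so that large files are not loaded into memory at once.
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def build_attribute_table(username: str, root: str) -> List[Dict[str, str]]:
    """Walk `root` and emit one set of attributes per file found."""
    table = []
    for directory, _subdirs, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(directory, name)
            table.append({
                "username": username,
                "filename": name,
                "location": directory,
                "checksum": file_checksum(path),
            })
    return table
```

Each entry of the returned list plays the role of one set of attributes; further attribute types, such as file size or modification date, could be added to the dictionary in the same manner.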

In one exemplary embodiment, a similarity table 70 is generated utilizing table 52 and table 62 and a predetermined similarity metric. Similarity table 70 is configured to indicate an amount of similarity between files represented in table 52 and files represented in table 62. In one non-limiting example, the similarity metric compares a checksum value for each file represented in first table 52 with checksum values for each file represented in second table 62 to generate similarity table 70. The similarity table may then be used to indicate files represented in each table 52, 62 that have an identical checksum, without requiring the files to have any other attributes in common (e.g., the files need not share an identical filename). For example, similarity table 70 shows a column of unique checksums found in tables 52 and 62. An indication of computer users having a file associated with each unique checksum is also represented in the similarity table 70. Similarity table 70 may optionally indicate whether multiple files having identical checksums are similarly named, located, owned, etc., depending on the attributes stored in tables 52 and 62. In this manner a computer user may, for example, locate copies of a particular file saved under different names, as is the case for “File X” of user 1 and “File D” of user 2. Additionally, or alternatively, the similarity table may be used to indicate the names or usernames of computer users having files with attributes in common, e.g., files having identical checksums.
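A minimal sketch of a checksum-keyed similarity table follows, assuming each table is a list of dictionaries with `username`, `filename`, and `checksum` keys. The function names and the checksum strings (`aa11`, `bb22`, `cc33`) are hypothetical placeholders rather than values taken from the figures.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def build_similarity_table(*tables: List[dict]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each unique checksum to the (username, filename) pairs that carry it."""
    by_checksum = defaultdict(list)
    for table in tables:
        for row in table:
            by_checksum[row["checksum"]].append((row["username"], row["filename"]))
    return dict(by_checksum)


def users_with_common_files(similarity: Dict[str, List[Tuple[str, str]]]) -> Dict[str, Set[str]]:
    """Keep only the checksums held by more than one distinct user."""
    common = {}
    for checksum, owners in similarity.items():
        users = {user for user, _filename in owners}
        if len(users) > 1:
            common[checksum] = users
    return common


# Hypothetical contents standing in for tables 52 and 62.
table_52 = [
    {"username": "User 1", "filename": "File X", "checksum": "aa11"},
    {"username": "User 1", "filename": "File B", "checksum": "bb22"},
]
table_62 = [
    {"username": "User 2", "filename": "File D", "checksum": "aa11"},
    {"username": "User 2", "filename": "File E", "checksum": "cc33"},
]

similarity_table = build_similarity_table(table_52, table_62)
print(users_with_common_files(similarity_table))
```

In this sketch the entry for checksum `aa11` lists both “File X” of User 1 and “File D” of User 2, which mirrors the case of one file saved under different names.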

In an alternative embodiment, the similarity metric is a collection of checksums of portions of first and second files, and if a predetermined percentage of the portions of the first and second files are similar based on the associated checksums, the first and second files would be identified as being similar.
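The portion-based metric might be sketched as below. Splitting each file into fixed-size portions at fixed offsets, the default portion size, the 75% threshold, and the function names are all assumptions of this sketch; the description leaves the portioning scheme and the percentage unspecified.

```python
import hashlib
from typing import List


def portion_checksums(data: bytes, portion_size: int = 4096) -> List[str]:
    """Checksum each fixed-size portion of the data (fixed offsets are assumed)."""
    return [
        hashlib.sha256(data[start:start + portion_size]).hexdigest()
        for start in range(0, len(data), portion_size)
    ]


def portion_similarity(first: bytes, second: bytes, portion_size: int = 4096) -> float:
    """Fraction of the first file's portions whose checksum also occurs in the second."""
    first_sums = portion_checksums(first, portion_size)
    second_sums = set(portion_checksums(second, portion_size))
    if not first_sums:
        # Two empty files are treated as identical; an empty file matches nothing else.
        return 1.0 if not second_sums else 0.0
    matched = sum(1 for checksum in first_sums if checksum in second_sums)
    return matched / len(first_sums)


def files_are_similar(first: bytes, second: bytes,
                      threshold: float = 0.75, portion_size: int = 4096) -> bool:
    """Apply the predetermined percentage (here an assumed 75%) to the portion score."""
    return portion_similarity(first, second, portion_size) >= threshold
```

With fixed offsets, inserting a single byte near the start of a file shifts every later portion, so a practical system might prefer content-defined portion boundaries; that refinement is outside this sketch.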

Referring to FIG. 7, a method for identifying computer users having files with common attributes in accordance with an exemplary embodiment is illustrated. The method can be implemented utilizing the system 9 described above.

At step 100, the computer 10 generates a first table 52 having a set of attributes for each file in a first set of files associated with a first computer user. The set of attributes for each file in the first set of files has a plurality of attribute types.

At step 102, the computer 20 generates a second table 62 having a set of attributes for each file in a second set of files associated with a second computer user. The set of attributes for each file in the second set of files has the plurality of attribute types.

At step 103, the computer 10 generates a similarity table 70 by comparing each set of attributes in the first table 52 with each set of attributes in the second table 62, utilizing a predetermined similarity metric.

At step 104, the computer 10 determines whether the first and second computer users have at least one file with common attributes, based on data in the similarity table 70.

At step 105, the computer 10 displays the names of the first and second computer users on a display device 16 when the first and second computer users have at least one file with common attributes. After step 105, the method is exited.

Referring now to FIGS. 8-12, other exemplary files and tables that can be utilized by the computer network 9 for allowing the identification of computer users having files with common attributes are illustrated. For purposes of discussion, it should be noted that tables of attributes for files associated with different computer users may be generated in a way so as to eliminate or reduce analysis of files that are rarely accessed directly by an average computer user, such as system or application files. In this embodiment, a table 132 is generated at a first time to include attributes for each file or directory in a set of files 130 associated with a first computer user. In a non-limiting example, the table 132 is generated before the first computer user has created or modified files in the set of files 130. A table 136 is generated at a second, later, time and also includes attributes for each file and/or directory in set of files 134 associated with the first computer user. In one non-limiting example, set of files 134 encompasses the same top-level file structure or directory hierarchy as set of files 130 such that any file in the directory structure that is unchanged between the generation of table 132 and the generation of table 136 will result in an identical set of attributes in both table 132 and table 136. It is, of course, recognized that the file structure associated with a particular set of files may take forms different from the structure illustrated in FIGS. 8-18, and that a computer user will likely have files named or located differently than illustrated, such as directory structures used in operating systems such as MAC OS, UNIX, LINUX, and WINDOWS.

Table 138 is generated from tables 132 and 136 and is configured to include only sets of attributes from table 136 that are not present in table 132. For example, since “System files” appears in both tables 132 and 136, it does not appear in the temporal differencing table 138. Also, as illustrated in FIGS. 8-9, at a first time, set of files 130 may include a file named “File A”, located at the top level of a directory structure (C:\) associated with the first user. The attributes associated with “File A” in table 132, in this example, include the username, filename, and file location within the directory structure. At the time table 136 is generated, “File A” exists in a different location than it did when table 132 was generated. Thus the attributes for “File A” are different in table 136 than in table 132. That difference requires attributes for “File A” to be listed in “temporal difference” table 138.
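The temporal differencing described above amounts to keeping every set of attributes in the later table that has no identical counterpart in the earlier table. A minimal sketch follows; the function name `difference_table`, the dictionary layout, and the later locations `C:\Project A` and `C:\Project B` are hypothetical and are not read from the figures.

```python
from typing import Dict, List


def difference_table(earlier: List[Dict[str, str]],
                     later: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the sets of attributes in `later` that are not identically in `earlier`."""
    # A row is reduced to a hashable, order-independent key of its attributes.
    seen = {tuple(sorted(row.items())) for row in earlier}
    return [row for row in later if tuple(sorted(row.items())) not in seen]


# Hypothetical stand-ins for tables 132 (first time) and 136 (second time).
table_132 = [
    {"username": "User 1", "filename": "System files", "location": "C:\\"},
    {"username": "User 1", "filename": "File A", "location": "C:\\"},
]
table_136 = [
    {"username": "User 1", "filename": "System files", "location": "C:\\"},
    {"username": "User 1", "filename": "File A", "location": "C:\\Project A"},
    {"username": "User 1", "filename": "File C", "location": "C:\\Project B"},
]

table_138 = difference_table(table_132, table_136)
print([row["filename"] for row in table_138])
```

Here “System files” is dropped because its attributes are unchanged, while “File A” is retained because its location attribute differs between the two tables.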

Referring to FIGS. 13-17, tables 142 and 146 are derived respectively from sets of files 140 and 144 at third and fourth times, respectively. The generation of table 142 at a third time preferably occurs before the second user has created or modified files in the set of files 140 associated with the second computer user. It is however recognized that files may be created or modified in set of files 140, associated with the second user, prior to generation of table 142. The generation of table 146 preferably occurs later in time than the generation of table 142, though not limited to be so, and the generation of tables 142 and 146 may correspond in time respectively with the generation of tables 132 and 136. A table 148 is generated from table 142 and table 146 to include sets of attributes from table 146 that are not represented in table 142.

A similarity table 150, shown in FIG. 18, is generated utilizing tables 138 and 148 and a predetermined similarity metric. The similarity metric is used to compare each set of attributes in table 138 against each set of attributes in table 148 to determine a degree of similarity of files represented by each set of attributes.

In one non-limiting example, the similarity metric utilizes filename and file location attributes from tables 138 and 148 to determine an amount of similarity between files. For example, FIG. 18 shows attributes for “File C”, which, according to table 138, is located at “C:\Project B” for the first user, and, according to table 148, is located at “C:\Project B\Project B-1” for the second user. Although the files have a common filename, they are differently located, resulting in, for example, an amount of similarity less than 100%, based on a scale used by the similarity metric. It is, of course, recognized that a variety of scales and similarity determinations may be used by a particular similarity metric.
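One possible scale for such a metric is sketched below. The equal weighting of filename and location, the shared-leading-component measure, and the function names are assumptions of this sketch and are not taken from the description. The nested location of the second user is written here as `C:\Project B\Project B-1`, which is an assumed reading of the path.

```python
from pathlib import PureWindowsPath


def location_similarity(first: str, second: str) -> float:
    """Fraction of leading path components that two locations share."""
    first_parts = PureWindowsPath(first).parts
    second_parts = PureWindowsPath(second).parts
    shared = 0
    for left, right in zip(first_parts, second_parts):
        if left.lower() != right.lower():
            break
        shared += 1
    longest = max(len(first_parts), len(second_parts))
    return shared / longest if longest else 1.0


def similarity_percent(first: dict, second: dict) -> float:
    """Score two sets of attributes on a 0-100 scale (equal weights are assumed)."""
    name_score = 1.0 if first["filename"] == second["filename"] else 0.0
    place_score = location_similarity(first["location"], second["location"])
    return 100.0 * (0.5 * name_score + 0.5 * place_score)


row_138 = {"username": "User 1", "filename": "File C", "location": r"C:\Project B"}
row_148 = {"username": "User 2", "filename": "File C",
           "location": r"C:\Project B\Project B-1"}

print(round(similarity_percent(row_138, row_148), 1))
```

Under these assumed weights the two “File C” entries share a filename and two of three path components, so the score falls below 100%, consistent with the example above.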

Referring to FIGS. 19 and 20, it should be noted that similarity table 150 may be used to select attributes of files that match, within a predetermined or selectable amount of similarity, a search attribute entered by a search user. For example, as shown in FIG. 19, a search attribute is received from a search user. According to a non-limiting embodiment of the invention, the search attribute is compared with attributes in the similarity table 150 to produce a result attribute or set of result attributes corresponding with the search attribute. For example, a search user may enter the term “File A”. The filename is compared with attributes in similarity table 150 and a set of result attributes is calculated and displayed, including, for example, a list of filenames 164 that are related to the search request “File A.” In one exemplary embodiment, the results obtained from the search (shown as Results 1 in FIG. 20) are “File X”, “File Y”, and “File Z”, which are files generated by the search user that are related to the file named “File A.” For example, the relationship between the files can be that “File A”, “File X”, “File Y”, and “File Z” are stored in the same directory. Further, a list of usernames 166 associated with other users who have a file with the filename “File A” is determined and displayed. In one exemplary embodiment, the list of usernames 166 includes a user named “User 2” who has a file named “File A” (shown as Results 2 in FIG. 20). It is, of course, recognized that the search results produced will depend on the type of attributes stored in similarity table 150. It is anticipated that search results may be ordered by amount of similarity to the search term, as well as by other sorting algorithms known to those skilled in the art.
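The search described above might be sketched as follows, assuming the similarity table is available as a list of rows with `username`, `filename`, and `location` keys and that the search attribute is an exact filename. The function name `search_by_filename`, the row layout, and the directory names are hypothetical.

```python
from typing import Dict, List, Tuple


def search_by_filename(rows: List[Dict[str, str]], search_user: str,
                       filename: str) -> Tuple[List[str], List[str]]:
    """Return (related filenames of the search user, other users holding the file)."""
    # Directories in which the search user keeps the named file.
    own_dirs = {row["location"] for row in rows
                if row["username"] == search_user and row["filename"] == filename}
    # Results 1: the search user's other files stored in those directories.
    related = sorted({row["filename"] for row in rows
                      if row["username"] == search_user
                      and row["location"] in own_dirs
                      and row["filename"] != filename})
    # Results 2: other users who have a file with the same filename.
    others = sorted({row["username"] for row in rows
                     if row["username"] != search_user
                     and row["filename"] == filename})
    return related, others


# Hypothetical rows standing in for the contents of similarity table 150.
rows_150 = [
    {"username": "User 1", "filename": "File A", "location": "C:\\Project A"},
    {"username": "User 1", "filename": "File X", "location": "C:\\Project A"},
    {"username": "User 1", "filename": "File Y", "location": "C:\\Project A"},
    {"username": "User 1", "filename": "File Z", "location": "C:\\Project A"},
    {"username": "User 2", "filename": "File A", "location": "C:\\Work"},
]

print(search_by_filename(rows_150, "User 1", "File A"))
```

Ordering the results by an amount of similarity, rather than alphabetically as here, would require the rows to carry a score from the similarity metric.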

Referring now to FIG. 21, a method for identifying computer users having files with common attributes in accordance with another exemplary embodiment is illustrated. The method can be implemented utilizing the system 9 described above.

At step 200, the computers 10 and 20 generate the first table 132 and the second table 142, respectively, comprising a set of attributes for each of first and second sets of files 130, 140, respectively, associated with first and second users, respectively.

At step 202, after generating the first and second tables 132, 142, the computers 10 and 20 generate third and fourth tables 136, 146 comprising a set of attributes for each of a third and fourth set of files, respectively, associated with the first and second computer users, respectively. It should be noted that table 136 can be generated at a different time than generation of table 132. Further, table 146 can be generated at a time different than generation of table 142.

At step 204, computers 10 and 20 generate first and second difference tables 138, 148, respectively, associated with the first and second computer users, respectively. The difference table 138 includes a set of attributes from table 136 that are not included identically in table 132. The difference table 148 includes a set of attributes from table 146 that are not included identically in table 142.

At step 206, the computer 10 generates a similarity table 150 based on the first and second difference tables 138, 148, utilizing a similarity metric. In particular, the computer 10 compares sets of attributes in table 138 with sets of attributes in table 148, utilizing a predetermined similarity metric to generate the similarity table 150.

At step 208, the computer 10 receives at least one search attribute from a search user. The search user can be either the first user or the second user. The search attribute corresponds to an attribute type contained in similarity table 150.

At step 210, the computer 10 displays one or more filename(s) associated with each set of attributes in the search user's difference table on the display device 16 wherein the set of attributes corresponds with the search attribute.

At step 212, the computer 10 displays one or more username(s) associated with each set of attributes in the first or second difference tables on the display device 16 wherein the set of attributes corresponds with the search attribute. After step 212, the method is exited.

It should be noted that in an alternate embodiment, an inferred relationship metric could be utilized to find files of first and second users having common attributes. An inferred relationship metric is a metric associated with an organization of files. For example, an inferred relationship metric could be a grouping of files in a folder. Further, for example, if User 1 and User 2 have “File Z” in common, the fact that User 2 also places “File C” and “File H” in the same folder as “File Z” may suggest an inferred relationship between “File C,” “File H,” and “File Z.”
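A folder-grouping form of the inferred relationship metric might be sketched as below. Matching the common file by filename alone, the function name `inferred_relationships`, and the folder names are assumptions of this sketch; a checksum or other attribute could equally serve to establish the common file.

```python
from collections import defaultdict
from typing import Dict, List, Set


def inferred_relationships(rows: List[Dict[str, str]], first_user: str,
                           second_user: str) -> Dict[str, Set[str]]:
    """For each file both users hold, list the second user's files in the same folder."""
    first_files = {row["filename"] for row in rows if row["username"] == first_user}
    # Group the second user's filenames by the folder that contains them.
    folders = defaultdict(set)
    for row in rows:
        if row["username"] == second_user:
            folders[row["location"]].add(row["filename"])
    inferred = {}
    for names in folders.values():
        for common in names & first_files:
            inferred.setdefault(common, set()).update(names - {common})
    return inferred


# Hypothetical rows; the folder names are placeholders.
rows = [
    {"username": "User 1", "filename": "File Z", "location": "C:\\Docs"},
    {"username": "User 2", "filename": "File Z", "location": "C:\\Shared"},
    {"username": "User 2", "filename": "File C", "location": "C:\\Shared"},
    {"username": "User 2", "filename": "File H", "location": "C:\\Shared"},
    {"username": "User 2", "filename": "File Q", "location": "C:\\Other"},
]

print(inferred_relationships(rows, "User 1", "User 2"))
```

In this sketch “File Q” is not reported, since it does not share a folder with the common file “File Z”.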

It is of course appreciated that the foregoing embodiments may be extended without limitation to generate tables and results associated with sets of files associated with more than two computer users within a computer network. It should be noted that in an alternative embodiment, the foregoing tables and results can be determined utilizing a third external computer or computer server, communicating with first and second computers that store the files associated with first and second computer users, respectively.

The capabilities of the present invention can be implemented in software, firmware, hardware or some combination thereof. As one example, one or more aspects of the present invention can be included in an article of manufacture (e.g., one or more computer program products) having, for instance, computer usable media. The media has embodied therein, for instance, computer readable program code means for providing and facilitating the capabilities of the present invention. The article of manufacture can be included as a part of a computer system or sold separately. Additionally, at least one program storage device readable by a machine, tangibly embodying at least one program of instructions executable by the machine to perform the capabilities of the present invention can be provided.

The flow diagrams depicted herein are just examples. There may be many variations to these diagrams or the steps (or operations) described therein without departing from the spirit of the invention. For instance, the steps may be performed in a differing order, or steps may be added, deleted or modified. All of these variations are considered a part of the claimed invention.

The system and methods for identifying computer users having files with common attributes provide a substantial advantage over other systems and methods. In particular, the system and methods provide a technical effect of enabling intranet users to find file resources in an intranet which are not otherwise sufficiently available, utilizing a similarity table which relates attributes of a file to the attributes of another file. Another effect of the system and the methods is that computer users are able to identify other computer users having similar files.

While the invention is described with reference to an exemplary embodiment, it will be understood by those skilled in the art that various changes may be made and equivalent elements may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to the teachings of the invention to adapt to a particular situation without departing from the scope thereof. Therefore, it is intended that the invention not be limited to the embodiment disclosed for carrying out this invention, but that the invention includes all embodiments falling within the scope of the appended claims. Moreover, the use of the terms first, second, etc. does not denote any order of importance, but rather the terms first, second, etc. are used to distinguish one element from another.

1. A method for identifying computer users having files with common attributes, comprising: generating a first table having a set of attributes for each file in a first set of files associated with a first computer user, the set of attributes for each file in the first set of files having a plurality of attribute types; generating a second table having a set of attributes for each file in a second set of files associated with a second computer user, the set of attributes for each file in the second set of files having the plurality of attribute types; generating a similarity table by comparing each set of attributes in the first table with each set of attributes in the second table, utilizing a predetermined similarity metric; and determining whether the first and second computer users have at least one file with common attributes, based on data in the similarity table.
2. The method of claim 1, wherein the first and second sets of files are stored on first and second computers, respectively.
3. The method of claim 1, wherein the set of attributes in the first table includes at least one of a user name, a filename, a file size, a file type, a file creation date, a file modification date, a file location, a checksum value associated with a file, and a collection of checksum values associated with portions of a file.
4. The method of claim 1, wherein the similarity metric is based on a quantity of checksum values in the first table that correspond to checksum values in the second table.
5. The method of claim 1, further comprising: generating a third table at a first time having a set of attributes for each file in a third set of files associated with the first computer user, the set of attributes for each file in the third set of files having the plurality of attribute types; generating a fourth table at a second time after the first time having a set of attributes for each file in a fourth set of files associated with the first computer user, the set of attributes for each file in the fourth set of files having the plurality of attribute types; generating the first table having only sets of attributes contained in the fourth table that are not contained in the third table; generating a fifth table at a third time having a set of attributes for each file in a fifth set of files associated with the second computer user, the set of attributes for each file in the fifth set of files having the plurality of attribute types; generating a sixth table at a fourth time after the third time having a set of attributes for each file in a sixth set of files associated with the second computer user, the set of attributes for each file in the sixth set of files having the plurality of attribute types; and generating the second table having only sets of attributes contained in the sixth table that are not contained in the fifth table.
6. The method of claim 1, further comprising: receiving a first file attribute that corresponds with a first file associated with the first computer user; and indicating a name of the second computer user associated with the second set of files wherein at least one file in the second set of files corresponds to the first file, utilizing the similarity table.
7. The method of claim 6, further comprising indicating one or more related files that are associated with the second computer user, wherein the related files are determined to correspond to the first file by utilizing a predetermined inferred relationship metric.
8. A system for identifying computer users having files with common attributes, comprising: first and second computers operably communicating with one another; a display device operably communicating with the first computer, the first computer configured to generate a first table having a set of attributes for each file in a first set of files associated with a first computer user, the set of attributes for each file in the first set of files having a plurality of attribute types, the second computer further configured to generate a second table having a set of attributes for each file in a second set of files associated with a second computer user, the set of attributes for each file in the second set of files having the plurality of attribute types, the first computer further configured to generate a similarity table by comparing each set of attributes in the first table with each set of attributes in the second table, utilizing a predetermined similarity metric, the first computer further configured to determine whether the first and second computer users have at least one file with common attributes, based on data in the similarity table, the first computer further configured to display a user identifier associated with at least one of the first and second computer users on the display device when the first and second computer users have at least one file with common attributes.
9. The system of claim 8, wherein the first computer is further configured to display a file name of the at least one file with common attributes on the display device.